Please ensure you have the following packages before knitting:
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE, message = FALSE) #suppresses warnings in the knit
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(httr)
library(jsonlite)
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:purrr':
##
## flatten
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:httr':
##
## config
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
##
## The following object is masked from 'package:dplyr':
##
## combine
##
## The following object is masked from 'package:ggplot2':
##
## margin
set.seed(1)
This report examines the number of cases and deaths during the COVID-19 pandemic, focusing on the United States alongside general global numbers. The main dataset comes from the Johns Hopkins University repository, accessed through GitHub.
Through this analysis, we hope to identify the rate of infections and deaths, as well as the areas most affected by the COVID-19 pandemic.
url_in <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/refs/heads/master/csse_covid_19_data/csse_covid_19_time_series/"
file_names <- c("time_series_covid19_confirmed_global.csv", "time_series_covid19_deaths_global.csv", "time_series_covid19_confirmed_US.csv", "time_series_covid19_deaths_US.csv")
urls <- str_c(url_in, file_names)
global_cases <- read_csv(urls[1])
global_deaths <- read_csv(urls[2])
us_cases <- read_csv(urls[3])
us_deaths <- read_csv(urls[4])
The next dataset comes from the Centers for Disease Control and Prevention (CDC): COVID deaths grouped by state, sex, and age group.
api_url <- "https://data.cdc.gov/resource/9bhg-hcku.json?$limit=200000"
stateDeaths_w_age_sex <- fromJSON(content(GET(api_url), "text"), flatten = TRUE)
The last dataset is the political party each US state was called for in the 2020 election. A better feature for the predictive model, seen later on, would be the population spread by age. However, the data from the United States Census database was too difficult to process for this project. For now, the political affiliation will act as an extra feature.
state_PolParty <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vS3Z8Rq9xqOLISwoKdK0n6CFLBuPSCoXbbLeY8vhi-rzFS3ZFNEtR0BCdEbHcS-2Tlh5aPcnZbwBLao/pub?output=csv")
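For US Cases and US Deaths, unpivot the date columns (everything after the first 11 and 12 identifier columns, respectively) into “dates” and “cases”, then drop the identifier columns that are not needed. Rename Admin2 to County, iso2 to Country_Short (country abbreviation), FIPS to fips (county code), and cases to deaths in the US Deaths dataset. Convert the “dates” column from character to date type. Lastly, for US Deaths, drop rows where the population is zero or missing.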
us_cases <- us_cases %>%
  pivot_longer(
    cols = !c(1:11),
    names_to = "dates",
    values_to = "cases",
    values_transform = as.numeric
  ) %>%
  select(-UID, -(iso3 : code3), -(Country_Region : Combined_Key)) %>%
  rename(County = Admin2, Country_Short = iso2, fips = FIPS) %>%
  mutate(dates = mdy(dates))
us_deaths <- us_deaths %>%
  pivot_longer(
    cols = !c(1:12),
    names_to = "dates",
    values_to = "cases",
    values_transform = as.numeric
  ) %>%
  select(-UID, -(iso3 : code3), -(Country_Region : Combined_Key)) %>%
  rename(County = Admin2, Country_Short = iso2, fips = FIPS, deaths = cases) %>%
  mutate(dates = mdy(dates)) %>%
  filter(!Population %in% c(0), !is.na(Population))
summary(us_cases)
## Country_Short fips County Province_State
## Length:3819906 Min. : 60 Length:3819906 Length:3819906
## Class :character 1st Qu.:19077 Class :character Class :character
## Mode :character Median :31012 Mode :character Mode :character
## Mean :33043
## 3rd Qu.:47130
## Max. :99999
## NA's :11430
## dates cases
## Min. :2020-01-22 Min. : -3073
## 1st Qu.:2020-11-02 1st Qu.: 330
## Median :2021-08-15 Median : 2272
## Mean :2021-08-15 Mean : 14088
## 3rd Qu.:2022-05-28 3rd Qu.: 8159
## Max. :2023-03-09 Max. :3710586
##
summary(us_deaths)
## Country_Short fips County Province_State
## Length:3688461 Min. : 60 Length:3688461 Length:3688461
## Class :character 1st Qu.:19023 Class :character Class :character
## Mode :character Median :30018 Mode :character Mode :character
## Mean :31337
## 3rd Qu.:46103
## Max. :72153
## NA's :1143
## Population dates deaths
## Min. : 86 Min. :2020-01-22 Min. : 0.0
## 1st Qu.: 11137 1st Qu.:2020-11-02 1st Qu.: 5.0
## Median : 26205 Median :2021-08-15 Median : 40.0
## Mean : 103153 Mean :2021-08-15 Mean : 189.5
## 3rd Qu.: 67493 3rd Qu.:2022-05-28 3rd Qu.: 126.0
## Max. :10039107 Max. :2023-03-09 Max. :35545.0
##
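The wide-to-long reshaping used above can be sketched on a toy table. This is only an illustration: the county names and counts are made up, and the identifier columns are selected by name here rather than by position.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy stand-in for the JHU wide layout: one row per county, one column per date.
# County names and counts are hypothetical.
wide <- tibble(
  Admin2 = c("County-A", "County-B"),
  Province_State = c("State-X", "State-X"),
  `1/22/20` = c(0, 1),
  `1/23/20` = c(2, 1)
)

long <- wide %>%
  pivot_longer(
    cols = !c(Admin2, Province_State),  # every column except the identifiers
    names_to = "dates",
    values_to = "cases"
  ) %>%
  rename(County = Admin2) %>%
  mutate(dates = mdy(dates))            # "1/22/20" becomes the Date 2020-01-22

print(long)
```

Each county now has one row per date, which is the shape the later `lag()` and `summarise()` steps expect.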
For Global Cases and Global Deaths, unpivot the date columns into “dates” and “cases” (renamed to “deaths” in the deaths dataset). Rename the two columns containing “/”, convert the dates column from character to date type, and remove the “Lat” and “Long” columns.
global_cases <- global_cases %>%
  pivot_longer(
    cols = !c(1:4),
    names_to = "dates",
    values_to = "cases",
    values_transform = as.numeric
  ) %>%
  rename(Prov_State = 'Province/State', Country_Region = 'Country/Region') %>%
  mutate(dates = mdy(dates)) %>%
  select(-Lat, -Long)
global_deaths <- global_deaths %>%
  pivot_longer(
    cols = !c(1:4),
    names_to = "dates",
    values_to = "cases",
    values_transform = as.numeric
  ) %>%
  rename(Prov_State = 'Province/State', Country_Region = 'Country/Region', deaths = cases) %>%
  mutate(dates = mdy(dates)) %>%
  select(-Lat, -Long)
summary(global_cases)
## Prov_State Country_Region dates cases
## Length:330327 Length:330327 Min. :2020-01-22 Min. : 0
## Class :character Class :character 1st Qu.:2020-11-02 1st Qu.: 680
## Mode :character Mode :character Median :2021-08-15 Median : 14429
## Mean :2021-08-15 Mean : 959384
## 3rd Qu.:2022-05-28 3rd Qu.: 228517
## Max. :2023-03-09 Max. :103802702
summary(global_deaths)
## Prov_State Country_Region dates deaths
## Length:330327 Length:330327 Min. :2020-01-22 Min. : 0
## Class :character Class :character 1st Qu.:2020-11-02 1st Qu.: 3
## Mode :character Mode :character Median :2021-08-15 Median : 150
## Mean :2021-08-15 Mean : 13380
## 3rd Qu.:2022-05-28 3rd Qu.: 3032
## Max. :2023-03-09 Max. :1123836
The CDC data includes summary rows, so remove them by filtering out “All Ages”, “All Sexes”, and “United States”. Also drop rows with zero deaths or with a missing year, month, or death count, remove unnecessary columns, and convert “month”, “year” and “covid_19_deaths” to numeric type.
stateDeaths_w_age_sex <- stateDeaths_w_age_sex %>%
select(-(data_as_of:group), -(total_deaths:footnote)) %>%
filter(!sex %in% c('All Sexes'), !age_group %in% c('All Ages'), !state %in% c('United States'), !covid_19_deaths %in% c(0)) %>%
drop_na(year, month, covid_19_deaths) %>%
mutate(month = as.numeric(month), year = as.numeric(year), covid_19_deaths = as.numeric(covid_19_deaths))
summary(stateDeaths_w_age_sex)
## state sex age_group covid_19_deaths
## Length:18282 Length:18282 Length:18282 Min. : 10.00
## Class :character Class :character Class :character 1st Qu.: 17.00
## Mode :character Mode :character Mode :character Median : 32.00
## Mean : 74.23
## 3rd Qu.: 73.00
## Max. :2944.00
## year month
## Min. :2020 Min. : 1.000
## 1st Qu.:2020 1st Qu.: 3.000
## Median :2021 Median : 7.000
## Mean :2021 Mean : 6.609
## 3rd Qu.:2022 3rd Qu.:10.000
## Max. :2023 Max. :12.000
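The reason for dropping the summary rows is that they would otherwise be double counted in any later sum. A minimal sketch on made-up rows (the values are hypothetical; the column names follow the CDC fields used above, which arrive from JSON as character):

```r
library(dplyr)
library(tidyr)

# Hypothetical rows mimicking the CDC layout; all fields are character.
raw <- tibble(
  state = c("United States", "Texas", "Texas", "Texas", "Ohio"),
  sex = c("All Sexes", "All Sexes", "Male", "Female", "Male"),
  age_group = c("All Ages", "All Ages", "65-74 years", "65-74 years", "All Ages"),
  year = c("2020", "2020", "2020", "2020", NA),
  month = c("4", "4", "4", "4", NA),
  covid_19_deaths = c("900", "100", "60", "40", "500")
)

clean <- raw %>%
  filter(
    sex != "All Sexes",          # drop rows that total over sex
    age_group != "All Ages",     # drop rows that total over age
    state != "United States"     # drop the national total
  ) %>%
  drop_na(year, month, covid_19_deaths) %>%
  mutate(across(c(year, month, covid_19_deaths), as.numeric))

sum(clean$covid_19_deaths)  # matches the Texas "All Sexes" total of 100
```

Only the two detailed Texas rows survive, and their sum equals the state-level summary row that was removed.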
The political party dataset was filtered to keep only the state-level rows.
state_PolParty <- state_PolParty %>%
select(state, called) %>%
filter(
!state %in% c("U.S. Total", "15 Key Battlegrounds", "Non-Battlegrounds"),
!grepl("1st District", state),
!grepl("2nd District", state),
!grepl("3rd District", state)
)
From my understanding, the deaths column is not deaths per day but a running total as of that day. Strangely, when calculating the daily delta, some areas show negative deaths. The cases column is also cumulative, so a recovery or a death should not lower it either; negative deltas in both columns most likely reflect corrections to earlier reports. I will leave the negatives in: if a count was entered incorrectly and later retracted, the negative cancels out the earlier addition.
However, because of the lag function, the delta column is incorrect at county boundaries: if row x is County-A on Dec 31, 2022 with 500 deaths and row x+1 is County-B on Jan 1, 2020 with 0 deaths, the delta is logged as -500 deaths. I got around this by using ifelse to check whether the county changes between rows (i.e. if it does, the delta is set to zero).
usCovid_delta <- us_cases %>%
  full_join(us_deaths) %>%
  filter(cases != 0) %>%
  mutate(
    PrevCases = lag(cases, n = 1),
    PrevDeaths = lag(deaths, n = 1),
    NewCases = ifelse(
      County == lag(County, n = 1),
      cases - PrevCases,
      0),
    NewDeaths = ifelse(
      County == lag(County, n = 1),
      deaths - PrevDeaths,
      0)
  ) %>%
  select(-PrevCases, -PrevDeaths)
global <- global_cases %>%
full_join(global_deaths) %>%
group_by(Prov_State, Country_Region, year(dates), month(dates)) %>%
summarise(cases = max(cases), deaths = max(deaths))
#use this for the modeling
stateDeaths_w_age_sex <- stateDeaths_w_age_sex %>%
full_join(state_PolParty)
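The county-boundary problem described above can also be handled by grouping before taking the lag, so that `lag()` never reaches across counties. This is an alternative sketch on made-up data, not the approach used in this report; the county names and counts are hypothetical.

```r
library(dplyr)

# Hypothetical cumulative death counts for two counties.
cumulative <- tibble(
  County = c("County-A", "County-A", "County-A", "County-B", "County-B"),
  Province_State = "State-X",
  dates = as.Date(c("2022-12-29", "2022-12-30", "2022-12-31",
                    "2020-01-01", "2020-01-02")),
  deaths = c(490, 495, 500, 0, 3)
)

daily <- cumulative %>%
  group_by(Province_State, County) %>%
  arrange(dates, .by_group = TRUE) %>%
  # Within each county, the first row has no predecessor, so its delta is 0.
  mutate(NewDeaths = deaths - lag(deaths, default = first(deaths))) %>%
  ungroup()

print(daily)
```

Grouping by state as well as county also keeps same-named counties in different states apart, which a comparison on the county name alone would not.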
The following is a heatmap showing the mean number of deaths by age by state for the US. Texas and California stand out, specifically in the higher age groups.
ageGroup_heatmap <- stateDeaths_w_age_sex %>%
group_by(state, age_group) %>%
summarise(mean_deaths = mean(covid_19_deaths))
plot_ly(
ageGroup_heatmap,
x = ~age_group,
y = ~state,
z = ~mean_deaths,
type = "heatmap"
) %>%
layout(title = 'Mean Deaths by State')
The next graph shows the number of cases and deaths in the United States as a whole, from January 2020 to March 2023.
#From what I've seen in the data, case and death counts start in March for most states
usCovid_delta_total <- usCovid_delta %>%
filter(Country_Short == "US") %>%
group_by(Year = year(dates), Month = month(dates)) %>%
summarise(cases_delta = sum(NewCases, na.rm=T), deaths_delta = sum(NewDeaths, na.rm=T))
ggplot(usCovid_delta_total) +
geom_line(
aes(
x = as.factor(Month),
y = cases_delta,
group = as.factor(Year),
colour = "Cases"),
linewidth = 1) +
geom_line(
aes(
x = as.factor(Month),
y = deaths_delta,
group = as.factor(Year),
colour = "Deaths"),
linewidth = 1) +
facet_grid(.~Year, scales = "free") +
scale_y_continuous(trans = "log10", labels = comma) +
labs(x = "Month",y = "Volume") +
ggtitle("COVID Cases and Deaths in the USA") +
scale_color_manual(name = "COVID", values = c("Cases" = "blue", "Deaths" = "red"))
Create the same graph for COVID numbers globally.
global_delta_total <- global_cases %>%
  full_join(global_deaths) %>%
  filter(cases != 0) %>%
  mutate(
    PrevCases = lag(cases, n = 1),
    PrevDeaths = lag(deaths, n = 1),
    NewCases = ifelse(
      (Prov_State == lag(Prov_State, n = 1) | is.na(Prov_State)) & Country_Region == lag(Country_Region, n = 1),
      cases - PrevCases,
      0),
    NewDeaths = ifelse(
      (Prov_State == lag(Prov_State, n = 1) | is.na(Prov_State)) & Country_Region == lag(Country_Region, n = 1),
      deaths - PrevDeaths,
      0)
  ) %>%
  select(-PrevCases, -PrevDeaths) %>%
  #compare against a Date value: an unquoted 2020-01-22 is evaluated as arithmetic (1997)
  filter(!((Country_Region == "France" & is.na(Prov_State)) | (Country_Region == "United Kingdom" & is.na(Prov_State))) & dates != as.Date("2020-01-22")) %>%
  group_by(year = year(dates), month = month(dates)) %>%
  summarise(cases_delta = sum(NewCases, na.rm=T), deaths_delta = sum(NewDeaths, na.rm=T))
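#Strange happenings: the lag function in this code works for every row except where United Arab Emirates becomes United Kingdom, and Finland becomes France. I could not pin down the cause, and trying the Lag function from Hmisc has the same issue. One possibility is the is.na(Prov_State) branch: a country-level row with no province that follows one of the same country's territories passes the check and is differenced against that territory. Since they are the only two instances, I have purposely removed them.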
ggplot(global_delta_total) +
geom_line(aes(x = as.factor(month), y = cases_delta, group = as.factor(year), colour = "Cases"), linewidth = 1) +
geom_line(aes(x = as.factor(month), y = deaths_delta, group = as.factor(year), colour = "Deaths"), linewidth = 1) +
facet_grid(.~year, scales = "free") +
scale_y_continuous(trans = "log10", labels = comma) +
labs(x = "Month",y = "Volume") +
ggtitle("COVID Cases and Deaths Globally") +
scale_color_manual(name = "COVID", values = c("Cases" = "blue", "Deaths" = "red"))
Globally, 2022 had the highest number of cases but also the largest gap between cases and deaths compared to the other years. At this time, many countries were well into the vaccine rollout, as observed by the World Health Organization. This could potentially explain how the case numbers jump at the beginning of 2022: vaccination allowed for more socializing and fewer lockdown restrictions (see https://ourworldindata.org/covid-vaccinations).
From the heatmap, California, Texas and Florida had the highest mean deaths in the older age groups. They are the three most populous states in the US, as seen in the table below.
us_pops_byState <- us_deaths %>%
filter(Country_Short == "US") %>%
group_by(County, Province_State) %>%
summarise(Population = mean(Population)) %>%
group_by(Province_State) %>%
summarise( total_pop = sum(Population, na.rm=T)) %>%
arrange(desc(total_pop)) %>%
head(10)
knitr::kable(us_pops_byState, "simple", format.args = list(big.mark = ",",
scientific = FALSE), col.names = c("State", "Total Population"))
| State | Total Population |
|---|---|
| California | 39,512,223 |
| Texas | 28,995,881 |
| Florida | 21,477,737 |
| New York | 19,453,561 |
| Pennsylvania | 12,801,989 |
| Illinois | 12,671,821 |
| Ohio | 11,689,100 |
| Georgia | 10,617,423 |
| North Carolina | 10,488,084 |
| Michigan | 9,986,857 |
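As seen in the table below, the counties with the highest single-day increases in COVID cases and deaths were mostly in warmer-climate states (Florida, California, Arizona and Nevada), with Cook County, Illinois as the exception. One possible explanation is that the population there is more likely to be outdoors and mixing, although very large single-day jumps can also come from backlogged reports being entered at once.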
us_singleDayInc <- usCovid_delta %>%
filter(Country_Short == "US", !is.na(Population)) %>%
group_by(County, Province_State) %>%
summarise(
Population = mean(Population),
high_NewCases = max(NewCases, na.rm = T),
high_NewDeaths = max(NewDeaths, na.rm = T)) %>%
arrange(desc(high_NewCases), desc(high_NewDeaths)) %>%
head(10)
knitr::kable(us_singleDayInc, "simple", format.args = list(big.mark = ",",
scientific = FALSE), col.names = c("County", "State", "Population", "Max New Cases", "Max New Deaths"))
| County | State | Population | Max New Cases | Max New Deaths |
|---|---|---|---|---|
| Miami-Dade | Florida | 2,716,940 | 110,441 | 2,806 |
| San Diego | California | 3,338,330 | 52,300 | 79 |
| Broward | Florida | 1,952,778 | 50,254 | 1,913 |
| Los Angeles | California | 10,039,107 | 45,553 | 928 |
| Cook | Illinois | 5,150,233 | 41,289 | 167 |
| Maricopa | Arizona | 4,485,414 | 34,764 | 318 |
| Palm Beach | Florida | 1,496,770 | 34,340 | 1,446 |
| Orange | Florida | 1,393,452 | 30,752 | 988 |
| Clark | Nevada | 2,266,715 | 27,876 | 356 |
| Orange | California | 3,175,692 | 25,439 | 110 |
us_deathsBy_sex <- stateDeaths_w_age_sex %>%
group_by(sex) %>%
summarise(total_deaths = sum(covid_19_deaths)) %>%
mutate(deci = total_deaths / sum(total_deaths)) %>%
mutate(perc = scales::percent(deci))
us_deathsBy_age_group <- stateDeaths_w_age_sex %>%
group_by(age_group) %>%
summarise(total_deaths = sum(covid_19_deaths)) %>%
mutate(deci = total_deaths / sum(total_deaths)) %>%
mutate(perc = scales::percent(deci))
Looking more closely at the sex of those who died from COVID in the US, 43% were female (589,372 total) and 57% were male (767,725 total). Compare this to the total US population, which is 50.25% male and 49.75% female (as of 2023, World Bank Group, https://data.worldbank.org/indicator/SP.POP.TOTL.FE.ZS?locations=US&view=map&year=2023).
Deaths by age group show that people past retirement age were at the highest risk: approximately 63% of the deaths were among people 65 years and older.
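The sex shares quoted above follow directly from the two totals. A short worked check in base R, using only the figures stated in the text:

```r
# Totals quoted in the text above.
deaths_by_sex <- c(Female = 589372, Male = 767725)

# Each group's share of all deaths, as a percentage rounded to one decimal.
share <- deaths_by_sex / sum(deaths_by_sex)
round(100 * share, 1)
# Female   Male
#   43.4   56.6
```

This is the same calculation as the `deci` and `perc` columns computed earlier, just without the `scales::percent()` formatting.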
stateDeaths_w_age_sex <- na.omit(stateDeaths_w_age_sex)
model <- randomForest(
formula = covid_19_deaths ~ .,
data = stateDeaths_w_age_sex,
mtry = 6
)
print(model)
##
## Call:
## randomForest(formula = covid_19_deaths ~ ., data = stateDeaths_w_age_sex, mtry = 6)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 6
##
## Mean of squared residuals: 1957.847
## % Var explained: 89.38
stateDeaths_w_age_sex <- na.omit(stateDeaths_w_age_sex) %>%
mutate(pred = round(predict(model),2))
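The mean of squared residuals is the model's out-of-bag mean squared error, so it is in squared deaths rather than deaths. Its square root, about 44, is the typical error in deaths for one state, sex, age group and month combination. The lower this number and the higher the % variance explained, the better.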
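Because the printed "Mean of squared residuals" is in squared units, taking its square root gives an error on the same scale as the death counts. A minimal sketch in base R; the `actual` and `predicted` vectors are made up for illustration, and only the 1957.847 figure comes from the model output above.

```r
# Hypothetical observed and predicted death counts, for illustration only.
actual <- c(10, 32, 74, 120)
predicted <- c(14, 30, 80, 110)

mse <- mean((actual - predicted)^2)  # mean squared error, in deaths squared
rmse <- sqrt(mse)                    # root mean squared error, in deaths

# Applied to the figure printed by print(model):
model_rmse <- sqrt(1957.847)
round(model_rmse, 1)
```

An error of roughly 44 deaths is easier to judge against the summary of `covid_19_deaths` shown earlier (median 32, mean about 74) than the raw squared figure.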
#Compare model to whole US
us_v_Pred <- stateDeaths_w_age_sex %>%
group_by(year, month) %>%
summarise(deaths = sum(covid_19_deaths, na.rm = T), pred = sum(pred, na.rm = T))
ggplot(us_v_Pred) +
geom_line(aes(x = as.factor(month), y = deaths, group = as.factor(year), colour = "Actuals"), linewidth = 1) +
geom_line(aes(x = as.factor(month), y = pred, group = as.factor(year), colour = "Model"), linewidth = 1) +
facet_grid(.~year, scales = "free") +
scale_y_continuous(trans = "log10", labels = comma) +
labs(x = "Month",y = "Volume") +
ggtitle("COVID Deaths in the US: Prediction vs Actuals") +
scale_color_manual(name = "COVID", values = c("Actuals" = "steelblue", "Model" = "orange3"))
Visually, the model looks very close to the actuals. The next graph focuses on the state of California and will show it is not as clean when the model for the US as a whole is used state by state.
#compare model to actual for California
Calif_v_Pred <- stateDeaths_w_age_sex %>%
filter(state == "California") %>%
group_by(state, year, month) %>%
summarise(deaths = sum(covid_19_deaths, na.rm = T), pred = sum(pred, na.rm = T))
ggplot(Calif_v_Pred) +
geom_line(aes(x = as.factor(month), y = deaths, group = as.factor(year), colour = "Actuals"), linewidth = 1) +
geom_line(aes(x = as.factor(month), y = pred, group = as.factor(year), colour = "Model"), linewidth = 1) +
facet_grid(.~year, scales = "free") +
scale_y_continuous(trans = "log10", labels = comma) +
labs(x = "Month",y = "Volume") +
ggtitle("COVID Deaths in California: Prediction vs Actuals") +
scale_color_manual(name = "COVID", values = c("Actuals" = "steelblue", "Model" = "orange3"))
Sampling bias: other countries do not have the same amount of information as the USA, nor is it as easily available. Some countries may have been unable or unwilling to properly report COVID cases and deaths.
In addition, choosing to import the political party by state instead of the religious demographic spread or another feature can be seen as a bias.
Preprocessing bias: there is only so much data I can compile. At some point, I wanted to import the US census data, but the output had variable codes instead of the text names and would have taken much more time to clean. To avoid going down a rabbit hole and wasting time, I had to close off that route.
sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8
## [3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Canada.utf8
##
## time zone: America/Toronto
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] randomForest_4.7-1.2 scales_1.3.0 plotly_4.10.4
## [4] jsonlite_1.8.9 httr_1.4.7 lubridate_1.9.4
## [7] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [10] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
## [13] tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 generics_0.1.3 stringi_1.8.4 hms_1.1.3
## [5] digest_0.6.37 magrittr_2.0.3 evaluate_1.0.3 grid_4.4.2
## [9] timechange_0.3.0 fastmap_1.2.0 crosstalk_1.2.1 viridisLite_0.4.2
## [13] lazyeval_0.2.2 jquerylib_0.1.4 cli_3.6.3 rlang_1.1.5
## [17] crayon_1.5.3 bit64_4.6.0-1 munsell_0.5.1 withr_3.0.2
## [21] cachem_1.1.0 yaml_2.3.10 parallel_4.4.2 tools_4.4.2
## [25] tzdb_0.4.0 colorspace_2.1-1 curl_6.2.0 vctrs_0.6.5
## [29] R6_2.5.1 lifecycle_1.0.4 htmlwidgets_1.6.4 bit_4.5.0.1
## [33] vroom_1.6.5 pkgconfig_2.0.3 pillar_1.10.1 bslib_0.8.0
## [37] gtable_0.3.6 glue_1.8.0 data.table_1.16.4 xfun_0.50
## [41] tidyselect_1.2.1 rstudioapi_0.17.1 knitr_1.49 farver_2.1.2
## [45] htmltools_0.5.8.1 rmarkdown_2.29 compiler_4.4.2
National Center for Health Statistics. Provisional COVID-19 Deaths by County, and Race and Hispanic Origin. Date accessed [2025-02-09]. https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-in-the-United-St/kn79-hsxy/data_preview
Johns Hopkins Center for Systems Science and Engineering. Novel Coronavirus (COVID-19) Cases. Date accessed [2025-02-09] https://github.com/CSSEGISandData/COVID-19
The Cook Political Report (with Amy Walter). Popular Vote Backend. Date accessed [2025-02-09]. https://www.cookpolitical.com/vote-tracker/2020/electoral-college
Access data with API Endpoint and JSON url
More than 1000 rows SODA API
Unpivot columns
https://datacornering.com/unpivot-data-in-r-with-the-pivot_longer-from-tidyr/
*Some unpivot cols are numeric, some are char https://stackoverflow.com/questions/71482059/pivot-longer-error-cant-combine-01-01-2020-character-and-03-01-2020-dou
Rename column using dplyr
Create new output of delta cases by day
Citing data sources
Import from Github using httr (I got a SSL Connect error):
Mutate based on conditional
Plotly Heatmap
Multi row x axis
Sum in Summarise showing NA values
Add a legend to ggplot with separate lines